Skip to content

fix: enforce WAL writer fencing via conditional PUT#6432

Open
hamersaw wants to merge 1 commit intolance-format:mainfrom
hamersaw:bug/wal-fencing
Open

fix: enforce WAL writer fencing via conditional PUT#6432
hamersaw wants to merge 1 commit intolance-format:mainfrom
hamersaw:bug/wal-fencing

Conversation

@hamersaw
Copy link
Copy Markdown
Contributor

@hamersaw hamersaw commented Apr 7, 2026

Summary

  • WAL fencing was broken: a fenced writer could keep writing WAL entries indefinitely because flush_to_with_index_update used plain put() with no fencing check. Two writers with the same shard ID silently overwrote each other's WAL entries.
  • WAL entries now use PutMode::Create (put-if-not-exists) — same cost as regular PUT on S3 but fails atomically if another writer already claimed that position.
  • New write_fence_barrier() at startup writes an empty WAL entry to claim the next position, scanning forward past any stale entries from old writers. This immediately fences zombie writers on open.

Test plan

  • 9 new fencing tests in wal::tests (conditional PUT rejection, fence barrier claiming, barrier skip-forward, barrier blocking old writer, sequential position uniqueness)
  • 8 new fencing tests in manifest::tests (epoch claiming, check_fenced edge cases, commit_update rejection, three-writer scenario)
  • All 12 existing write::tests pass (including E2E correctness test with fence barrier in startup path)
  • Zero clippy warnings

🤖 Generated with Claude Code

WAL fencing was broken — a fenced writer could keep writing WAL entries
indefinitely because flush_to_with_index_update used plain put() with no
fencing check. Two writers with the same shard ID would silently overwrite
each other's WAL entries since both started from the same manifest position.

Replace with conditional PUT fencing:
- WAL entries now use PutMode::Create (put-if-not-exists), which costs the
  same as regular PUT on S3 (If-None-Match header) but fails atomically if
  another writer already claimed that position
- New write_fence_barrier() at startup writes an empty WAL entry to claim
  the next position, scanning forward past any stale entries from old writers
- Removes manifest read from WAL flush hot path (was 4-5 round trips per flush)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the bug Something isn't working label Apr 7, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 92.91045% with 19 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/dataset/mem_wal/wal.rs 90.06% 10 Missing and 5 partials ⚠️
rust/lance/src/dataset/mem_wal/manifest.rs 97.41% 3 Missing ⚠️
rust/lance/src/dataset/mem_wal/write.rs 0.00% 0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@hamersaw
Copy link
Copy Markdown
Contributor Author

hamersaw commented Apr 8, 2026

Verification that fencing was broken on main

To confirm this fix addresses a real bug, I added test_wal_flush_rejects_duplicate_position to main (without the fix) and ran it. The test creates two WalFlusher instances with the same shard ID starting at the same WAL position — simulating two writers that read the same stale manifest.

On main: test FAILS — both writers succeed because put() is an unconditional overwrite:

Writer B should be fenced out (rejected) when writing to the same WAL position,
but it succeeded — fencing is broken! Plain put() silently overwrites the file.
test result: FAILED. 0 passed; 1 failed;

On this branch: test PASSES — the second writer is rejected because put_opts(..., PutMode::Create) fails atomically when the file already exists.

Root cause: flush_to_with_index_update used store.inner.put() with no conditional write semantics, so any writer could silently overwrite another writer's WAL entry at the same position.

Copy link
Copy Markdown
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a "put not exists" actually free? I thought object stores typically had a much lower throughput for conditional operations (e.g. 10 per second) than unconditional operations.

Comment on lines +505 to +508
// Build an empty WAL entry (schema + epoch metadata, no batches)
let mut metadata = schema.metadata().clone();
metadata.insert(WRITER_EPOCH_KEY.to_string(), self.writer_epoch.to_string());
let schema_with_epoch = ArrowSchema::new_with_metadata(schema.fields().to_vec(), metadata);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we confirmed these fence barriers don't bother the read half?

Comment on lines +565 to +566
"Writer fenced: WAL position {} already exists for shard {} \
(another writer has claimed this position)",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the fenced writer do if it's not automatically retrying?

Comment on lines +1016 to +1019
// Write fence barrier to claim WAL positions and fence zombie writers.
// Uses PutMode::Create to atomically claim the next WAL position,
// scanning forward past any entries left by old (fenced) writers.
wal_flusher.write_fence_barrier(&schema).await?;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's slightly counter intuitive that a method named open will actually do a write (e.g. isn't idempotent). Can we document this in the method docs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants